On Evaluating Agent Performance in a Fixed Period of Time

Author

  • José Hernández-Orallo
Abstract

The evaluation of several agents over a given task in a finite period of time is a very common problem in experimental design, statistics, computer science, economics and, in general, any experimental science. It is also crucial for intelligence evaluation. In reinforcement learning, the task is formalised as an interactive environment with observations, actions and rewards. Typically, the decision that has to be made by the agent is a choice among a set of actions, cycle after cycle. However, in real evaluation scenarios, the time can be intentionally modulated by the agent. Consequently, agents not only choose an action but they also choose the time when they want to perform an action. This is natural in biological systems but it is also an issue in control. In this paper we revisit the classical reward aggregating functions which are commonly used in reinforcement learning and related areas, we analyse their problems, and we propose a modification of the average reward to get a consistent measurement for continuous time.
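To make the classical reward aggregating functions mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the modification proposed in the paper, which the abstract does not specify. The function names, the `(time, reward)` event representation, and the two example agents are assumptions chosen purely for illustration. The sketch shows how the ranking of two agents can flip depending on whether rewards are averaged per action or per unit of time once agents are free to choose when they act.

```python
from typing import List, Tuple

# Hypothetical representation: each event is (time_of_action, reward),
# with times lying inside a fixed evaluation period [0, horizon].
Event = Tuple[float, float]


def total_reward(events: List[Event]) -> float:
    """Cumulative reward: the plain sum of all rewards obtained."""
    return sum(r for _, r in events)


def average_per_action(events: List[Event]) -> float:
    """Average reward per action (per cycle), ignoring elapsed time."""
    return total_reward(events) / len(events) if events else 0.0


def average_per_time(events: List[Event], horizon: float) -> float:
    """Average reward per unit of time over the whole evaluation period."""
    return total_reward(events) / horizon


def discounted_reward(events: List[Event], gamma: float) -> float:
    """Geometrically discounted reward, discounting by cycle index."""
    return sum((gamma ** k) * r for k, (_, r) in enumerate(events))


horizon = 10.0
# Illustrative agents: one acts every time unit for a small reward,
# the other waits and acts only twice for a larger reward each time.
fast = [(float(t), 0.5) for t in range(1, 11)]
slow = [(5.0, 1.0), (10.0, 1.0)]

for name, agent in (("fast", fast), ("slow", slow)):
    print(
        name,
        total_reward(agent),
        average_per_action(agent),
        average_per_time(agent, horizon),
    )
```

Under these assumed numbers the slow agent scores higher on average reward per action, while the fast agent scores higher on both total reward and average reward per unit of time, which illustrates why the choice of aggregating function matters when agents can modulate their timing.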


Similar resources

Evaluating the Efficiency of Firms with Negative Data in Multi-Period Systems: An Application to Bank Data

Data Envelopment Analysis (DEA) is a mathematical technique to evaluate the performance of firms with multiple inputs and outputs. In conventional DEA models, the efficiency scores of Decision Making Units (DMUs) with non-negative inputs and outputs are evaluated in a special period of time. However, in the real world there are situations wherein performance of firms must be evaluated in multip...


Macroeconomic Determinants of Manufacturing Sector Performance in Nigeria: an Asymmetric Non-Linear Approach

This study investigates the responsiveness of manufacturing sector performance to major macroeconomic determinants in Nigeria, covering the period between 1981 and 2018. It contributes to attendant literature by examining the asymmetric impact of each of the macroeconomic variables, including GDP per capita, exchange rate, inflation rate, interest rate proxied by prime lending rate, and gross f...


Evaluating the Performance of an Ambidextrous Bank Using an Agent-based Modeling Approach: A Case Study of Sepah Bank

Banks are the financial institutions that collect assets from various sources and allocate them to the sectors that require liquidity. Therefore, banks are an inherent element in the system of every country. As private banks enter financial markets, the demand for diverse banking services increases dramatically. Banks seek to use various techniques to improve their performance in attracting cus...


Evaluating the Effectiveness of Using Sport and Traditional Games at the Higher Military Educational Establishments in a Pandemic and Post-pandemic Period

Background. The cadets experienced particular difficulties in a pandemic period because of isolation and faced a number of unanticipated challenges like stress, anxiety, and low learning outcomes. Objectives: The aim of the study was to evaluate the effectiveness of sport and traditional games to improve cadets’ learning performance and motivation for learning activities and future service ...


Evaluating the Performance of Hot Mix Asphalt with Reclaimed Asphalt Pavement and Heavy Vacuum Slops as Rejuvenator

Due to the high price of crude oil, and consequently asphalt binder, the application of Reclaimed Asphalt Pavement (RAP) in pavement technology is widely considered. The present paper is the result of a laboratory research which was carried out to investigate the effects of adding a rejuvenating agent to Hot Mix Asphalt (HMA) with RAP. To this end, test samples comprised of Aged Asphalt Binder ...


One-for-One Period Policy and its Optimal Solution

In this paper we introduce the optimal solution for a simple and yet practical inventory policy with the important characteristic which eliminates the uncertainty in demand for suppliers. In this new policy which is different from the classical inventory policies, the time interval between any two consecutive orders is fixed and the quantity of each order is one. Assuming the fixed ordering cos...




Publication date: 2009